TutoRial - Part 2
Marine Ecosystem Dynamics - 2025
Pipes
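Pipes, written as %>% (from the magrittr package) or |> (the native pipe, available since R 4.1), make our code clearer. With pipes, the data flow from one function to the next, so the steps read from left to right instead of from the inside out.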
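As a small illustration outside the exercises, the sketch below writes the same computation once as nested calls and once with the native pipe. It assumes R 4.1 or later; the vector name `areas` and its values are made up for the example.

```r
# Hypothetical values, chosen only for illustration
areas <- c(1, 4, 9)

# Nested calls are read from the inside out
nested <- mean(sqrt(areas))

# With the native pipe, the same steps read from left to right
piped <- areas |> sqrt() |> mean()

nested
#> [1] 2
identical(nested, piped)
#> [1] TRUE
```

Both forms return the same value; the pipe only changes how the code reads.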
Exercises
- Rewrite these chunks of code using pipes
sum(c(2.2,4.1,2,pi))
Solution
c(2.2,4.1,2,pi) |> sum()
# OR
c(2.2,4.1,2,pi) %>% sum()
round(sum(c(2.2,4.1,2,pi)))
Solution
c(2.2,4.1,2,pi) |> sum() |> round()
# OR
c(2.2,4.1,2,pi) %>% sum() %>% round()
round(sum(c(2.2,4.1,2,pi)), digits = 3)
Solution
c(2.2,4.1,2,pi) |> sum() |> round(digits = 3)
# OR
c(2.2,4.1,2,pi) %>% sum() %>% round(digits = 3)
Tidy the data with tidyr
As seen in the slides, a tidy table has:
- Each variable in its own column
- Each observation in its own row
To reach this, tidyr has 4 key functions:
- pivot_longer
- pivot_wider
- unite
- separate
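Of these four functions, pivot_longer is the only one not used in the exercises below, so here is a minimal sketch of what it does. The data frame `wide_example` and its values are hypothetical and only serve the illustration.

```r
library(tidyr)

# Hypothetical wide table: one biomass column per month
wide_example <- data.frame(
  Station = c("BY15", "BY31"),
  Jan = c(6.7, 1.8),
  Feb = c(5.1, 2.3)
)

# pivot_longer() gathers the month columns into a name column and a value column
long_example <- wide_example |>
  pivot_longer(cols = c(Jan, Feb), names_to = "Month", values_to = "Biomass")

long_example
```

The result has one row per Station and Month combination, which is the reverse of what pivot_wider produces.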
Exercises
- If you have not done so yet, download the dataset zooplankton_seasonality.csv
- Import the dataset into your environment
- Is this dataset tidy?
Solution
| Month_abb | Year | Station | Coordinates | Group | Taxa | Biomass |
|---|---|---|---|---|---|---|
| Jan | 2009 | BY15 | 20.05000/57.33333 | Copepoda | Acartia | 6.650319 |
| Jan | 2009 | BY31 | 18.23333/58.58812 | Copepoda | Acartia | 1.816994 |
| Jan | 2009 | BY5 | 15.98333/55.25000 | Copepoda | Acartia | 5.562097 |
| Jan | 2009 | BY15 | 20.05000/57.33333 | Copepoda | Centropages | 5.738562 |
| Jan | 2009 | BY31 | 18.23333/58.58812 | Copepoda | Centropages | 1.228759 |
| Jan | 2009 | BY5 | 15.98333/55.25000 | Copepoda | Centropages | 14.405224 |
Each variable has its own column.
Each observation has its own row.
However, Coordinates holds 2 values (longitude and latitude) in a single column, so the dataset is not fully tidy.
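The import call itself is not shown in the solution. One possible sketch, assuming the file is comma-separated and can be read with base R's read.csv() (readr::read_csv() would be an alternative), is given below. To stay self-contained it first writes the first two rows of the table above to a temporary file; with the real file in the working directory, the call would presumably be read.csv("zooplankton_seasonality.csv").

```r
# Write a two-row sample (taken from the table above) to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Month_abb,Year,Station,Coordinates,Group,Taxa,Biomass",
  "Jan,2009,BY15,20.05000/57.33333,Copepoda,Acartia,6.650319",
  "Jan,2009,BY31,18.23333/58.58812,Copepoda,Acartia,1.816994"
), csv_path)

# Import the file; the comma separator is an assumption about the real dataset
zooplankton_sample <- read.csv(csv_path)

# Inspect the column types before tidying
str(zooplankton_sample)
```

Checking the structure right after the import shows that Coordinates is read as text, since it combines two numbers in one field.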
- Separate the column Coordinates into 2 new columns: Longitude and Latitude
Solution
library(tidyr)
zooplankton |>
  separate(Coordinates, into = c("Longitude", "Latitude"), sep = "/")
| Month_abb | Year | Station | Longitude | Latitude | Group | Taxa | Biomass |
|---|---|---|---|---|---|---|---|
| Jan | 2009 | BY15 | 20.05000 | 57.33333 | Copepoda | Acartia | 6.650319 |
| Jan | 2009 | BY31 | 18.23333 | 58.58812 | Copepoda | Acartia | 1.816994 |
| Jan | 2009 | BY5 | 15.98333 | 55.25000 | Copepoda | Acartia | 5.562097 |
| Jan | 2009 | BY15 | 20.05000 | 57.33333 | Copepoda | Centropages | 5.738562 |
| Jan | 2009 | BY31 | 18.23333 | 58.58812 | Copepoda | Centropages | 1.228759 |
| Jan | 2009 | BY5 | 15.98333 | 55.25000 | Copepoda | Centropages | 14.405224 |
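By default, separate() returns the new columns as text. As a variation on the solution above, the sketch below uses its convert argument so that the new columns are converted to numbers directly. The small data frame `stations` is built here only for illustration, using coordinates from the table.

```r
library(tidyr)

# Two stations with coordinates stored as "longitude/latitude" text
stations <- data.frame(
  Station = c("BY15", "BY31"),
  Coordinates = c("20.05000/57.33333", "18.23333/58.58812")
)

# convert = TRUE asks separate() to guess the type of the new columns
stations_split <- stations |>
  separate(Coordinates, into = c("Longitude", "Latitude"), sep = "/", convert = TRUE)

str(stations_split)
```

Without convert = TRUE, Longitude and Latitude stay as character columns and need a later as.numeric() step.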
- Combine the columns Group and Taxa into a new column Group_Taxa and save the dataframe as tidy_df
Solution
library(tidyr)
tidy_df <-
  zooplankton |>
  separate(Coordinates, into = c("Longitude", "Latitude"), sep = "/") |>
  unite("Group_Taxa", c(Group, Taxa))
| Month_abb | Year | Station | Longitude | Latitude | Group_Taxa | Biomass |
|---|---|---|---|---|---|---|
| Jan | 2009 | BY15 | 20.05000 | 57.33333 | Copepoda_Acartia | 6.650319 |
| Jan | 2009 | BY31 | 18.23333 | 58.58812 | Copepoda_Acartia | 1.816994 |
| Jan | 2009 | BY5 | 15.98333 | 55.25000 | Copepoda_Acartia | 5.562097 |
| Jan | 2009 | BY15 | 20.05000 | 57.33333 | Copepoda_Centropages | 5.738562 |
| Jan | 2009 | BY31 | 18.23333 | 58.58812 | Copepoda_Centropages | 1.228759 |
| Jan | 2009 | BY5 | 15.98333 | 55.25000 | Copepoda_Centropages | 14.405224 |
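The underscore in Copepoda_Acartia comes from the default separator of unite(). A short sketch of its sep and remove arguments follows; the data frame `taxa_example` is made up for the illustration.

```r
library(tidyr)

taxa_example <- data.frame(
  Group = c("Copepoda", "Cladocera"),
  Taxa = c("Acartia", "Bosmina")
)

# sep changes the separator; remove = FALSE keeps the original columns
united <- taxa_example |>
  unite("Group_Taxa", c(Group, Taxa), sep = ": ", remove = FALSE)

united$Group_Taxa
#> [1] "Copepoda: Acartia"  "Cladocera: Bosmina"
```

Keeping the default "_" is convenient here because the combined values later become column names in the wide table.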
- Create a wide table with one column of Biomass values for each Group_Taxa and save the dataframe as wide_df
Solution
library(tidyr)
wide_df <-
  tidy_df |>
  pivot_wider(names_from = Group_Taxa, values_from = Biomass)
| Month_abb | Year | Station | Longitude | Latitude | Copepoda_Acartia | Copepoda_Centropages | Copepoda_Pseudocalanus | Copepoda_Temora | Rotatoria_Synchaeta | Copepoda_Eurytemora | Rotatoria_Keratella | Cladocera_Bosmina | Cladocera_Evadne | Cladocera_Podon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan | 2009 | BY15 | 20.05000 | 57.33333 | 6.650319 | 5.7385615 | 10.522882 | 9.725488 | 0.3921570 | NA | NA | NA | NA | NA |
| Jan | 2009 | BY31 | 18.23333 | 58.58812 | 1.816994 | 1.2287586 | 5.633984 | 4.993465 | 0.4705890 | NA | NA | NA | NA | NA |
| Jan | 2009 | BY5 | 15.98333 | 55.25000 | 5.562097 | 14.4052240 | 21.594775 | 45.738529 | 0.3921570 | NA | NA | NA | NA | NA |
| Jan | 2010 | BY15 | 20.05000 | 57.33333 | 2.467319 | 0.3071893 | 13.601301 | 7.549021 | 0.1568628 | NA | NA | NA | NA | NA |
| Jan | 2010 | BY31 | 18.23333 | 58.58812 | 2.248367 | 0.3856208 | 2.660128 | 8.418301 | 0.4117650 | 0.0849674 | NA | NA | NA | NA |
| Jan | 2011 | BY15 | 20.05000 | 57.33333 | 5.065367 | 2.9803908 | 49.660135 | 36.431384 | 0.5490210 | NA | NA | NA | NA | NA |
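The NA cells in the wide table appear wherever a Group_Taxa has no row for a given month, year and station. If a missing row can be read as zero biomass (an assumption that depends on the sampling protocol, not something the dataset states), pivot_wider() can fill those cells through its values_fill argument. The sketch below uses three rows taken from the table above.

```r
library(tidyr)

# Three long-format rows: BY15 has no Eurytemora record
long_example <- data.frame(
  Station = c("BY15", "BY31", "BY31"),
  Group_Taxa = c("Copepoda_Acartia", "Copepoda_Acartia", "Copepoda_Eurytemora"),
  Biomass = c(2.467319, 2.248367, 0.0849674)
)

# Default behaviour: the missing combination becomes NA
wide_na <- long_example |>
  pivot_wider(names_from = Group_Taxa, values_from = Biomass)

# values_fill = 0 replaces the missing combination with 0 instead
wide_zero <- long_example |>
  pivot_wider(names_from = Group_Taxa, values_from = Biomass, values_fill = 0)

wide_zero
```

Whether 0 or NA is the right choice changes later averages, so it is worth deciding explicitly.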
Data handling with dplyr
Once the data are tidy, we often use the dplyr package to process them.
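Before the exercises, here is a minimal sketch of how dplyr verbs chain together on a toy table. The data frame `catches` and its values are hypothetical. The .groups = "drop" argument of summarise() is optional and simply returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data, with Year stored as text
catches <- data.frame(
  Station = c("A", "A", "A", "B", "B", "B"),
  Year = c("2011", "2012", "2013", "2011", "2012", "2013"),
  Biomass = c(1, 2, 4, 5, 10, 20)
)

catches_summary <- catches |>
  mutate(Year = as.numeric(Year)) |>   # change a column
  filter(Year >= 2012) |>              # keep some rows
  group_by(Station) |>                 # define the groups
  summarise(average = mean(Biomass),   # one row per group
            .groups = "drop")

catches_summary$average
#> [1]  3 15
```

Each verb takes a data frame and returns a data frame, which is why they combine so naturally with the pipe.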
Exercises
- What is the class of the Year column of the tidy_df dataframe? If it is not numeric, mutate it into numeric values.
Solution
str(tidy_df)
#> tibble [2,956 × 7] (S3: tbl_df/tbl/data.frame)
#> $ Month_abb : chr [1:2956] "Jan" "Jan" "Jan" "Jan" ...
#> $ Year : chr [1:2956] "2009" "2009" "2009" "2009" ...
#> $ Station : chr [1:2956] "BY15" "BY31" "BY5" "BY15" ...
#> $ Longitude : chr [1:2956] "20.05000" "18.23333" "15.98333" "20.05000" ...
#> $ Latitude : chr [1:2956] "57.33333" "58.58812" "55.25000" "57.33333" ...
#> $ Group_Taxa: chr [1:2956] "Copepoda_Acartia" "Copepoda_Acartia" "Copepoda_Acartia" "Copepoda_Centropages" ...
#> $ Biomass : num [1:2956] 6.65 1.82 5.56 5.74 1.23 ...
Year is stored as characters, and so are Longitude and Latitude.
library(dplyr)
tidy_df |>
  mutate(Year = as.numeric(Year))
- Then, keep all Year values between 2012 and 2015
Solution
library(dplyr)
tidy_df |>
  mutate(Year = as.numeric(Year),
         Longitude = as.numeric(Longitude),
         Latitude = as.numeric(Latitude)) |>
  filter(Year %in% 2012:2015)
- Then, only keep the data from the Station BY31
Solution
library(dplyr)
tidy_df |>
  mutate(Year = as.numeric(Year),
         Longitude = as.numeric(Longitude),
         Latitude = as.numeric(Latitude)) |>
  filter(Year %in% 2012:2015) |>
  filter(Station == "BY31")
# OR
tidy_df |>
  mutate(Year = as.numeric(Year),
         Longitude = as.numeric(Longitude),
         Latitude = as.numeric(Latitude)) |>
  filter(Year %in% 2012:2015,
         Station == "BY31")
- Then, select all columns except Longitude and Latitude
Solution
library(dplyr)
tidy_df |>
  mutate(Year = as.numeric(Year),
         Longitude = as.numeric(Longitude),
         Latitude = as.numeric(Latitude)) |>
  filter(Year %in% 2012:2015,
         Station == "BY31") |>
  select(-Longitude,
         -Latitude)
- Then, rename Month_abb as Month
Solution
library(dplyr)
tidy_df |>
  mutate(Year = as.numeric(Year),
         Longitude = as.numeric(Longitude),
         Latitude = as.numeric(Latitude)) |>
  filter(Year %in% 2012:2015,
         Station == "BY31") |>
  select(-Longitude,
         -Latitude) |>
  rename(Month = Month_abb)
- Then, group_by Month and Group_Taxa, compute the average and standard deviation of Biomass, and save the dataframe as summarised_df
Solution
library(dplyr)
summarised_df <-
  tidy_df |>
  mutate(Year = as.numeric(Year),
         Longitude = as.numeric(Longitude),
         Latitude = as.numeric(Latitude)) |>
  filter(Year %in% 2012:2015,
         Station == "BY31") |>
  select(-Longitude,
         -Latitude) |>
  rename(Month = Month_abb) |>
  group_by(Month, Group_Taxa) |>
  summarise(average = mean(Biomass),
            standard_deviation = sd(Biomass))
Plotting the data with ggplot2
In this part, we will build a plot step by step using the grammar of graphics in ggplot2.
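To make the idea of building a plot in layers concrete, here is a minimal sketch on a made-up data frame `toy`. It uses geom_col(), which is shorthand for geom_bar(stat = "identity"). All object names are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(Month = c("Jan", "Feb", "Mar"), average = c(3, 1, 2))

# Step 1: data and aesthetic mappings (no geometry yet, so only the axes are drawn)
p <- ggplot(data = toy, mapping = aes(x = Month, y = average))

# Step 2: add a geometry that draws one bar per row
p_bars <- p + geom_col()

# Step 3: add labels
p_final <- p_bars + labs(x = "Month", y = "Average biomass")

p_final
```

Each + adds one component to the plot object, so intermediate steps can be stored and reused.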
- Load the package and only keep the values for the copepod Acartia from the summarised_df dataset in a new dataset called acartia
Solution
library(ggplot2)
acartia <-
  summarised_df |>
  filter(Group_Taxa == "Copepoda_Acartia")
- Initiate a ggplot with the dataset acartia, with Month on the x-axis and the average biomass on the y-axis
Solution
ggplot(data = acartia,
       mapping = aes(x = Month, y = average))
- Add a barplot geometry to the plot
Solution
ggplot(data = acartia,
       mapping = aes(x = Month, y = average)) +
  geom_bar(stat = "identity") # <- needed so the bar heights use the y values as given
- Arrange the bars from the lowest to the highest value
Solution
ggplot(data = acartia,
       mapping = aes(x = reorder(Month, average), y = average)) +
  geom_bar(stat = "identity") # <- needed so the bar heights use the y values as given
- Fill the bars with a color according to the Month
Solution
ggplot(data = acartia,
       mapping = aes(x = reorder(Month, average), y = average, fill = Month)) +
  geom_bar(stat = "identity") # <- needed so the bar heights use the y values as given
- Change the axis labels to Biomass and Month, and add a title
Solution
ggplot(data = acartia,
       mapping = aes(x = reorder(Month, average), y = average, fill = Month)) +
  geom_bar(stat = "identity") + # <- needed so the bar heights use the y values as given
  labs(x = "Month", y = "Biomass", title = "My nice ggplot")
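The standard_deviation column computed earlier is not used in the plot. As an optional extension that goes beyond the exercises, it could be drawn as error bars with geom_errorbar(). The sketch below uses a made-up data frame `acartia_example` with the same column names, so the values are hypothetical.

```r
library(ggplot2)

# Hypothetical data with the same columns as the acartia dataset
acartia_example <- data.frame(
  Month = c("Jan", "Feb", "Mar"),
  average = c(3, 5, 4),
  standard_deviation = c(0.5, 1, 0.8)
)

p_error <- ggplot(data = acartia_example,
                  mapping = aes(x = reorder(Month, average), y = average, fill = Month)) +
  geom_bar(stat = "identity") +
  # Error bars spanning one standard deviation below and above the average
  geom_errorbar(aes(ymin = average - standard_deviation,
                    ymax = average + standard_deviation),
                width = 0.2) +
  labs(x = "Month", y = "Biomass")

p_error
```

With the real data, replacing acartia_example by acartia should give the same layout, assuming each month has more than one observation so that the standard deviation is defined.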